ABSTRACT
The majority of prognostic scores proposed for early assessment of coronavirus disease 19 (COVID-19) patients are limited by methodological flaws. Our group recently developed a new risk score, ABC2-SPH, using traditional statistical methods (least absolute shrinkage and selection operator, LASSO, logistic regression). In this article, we provide a thorough comparative study of modern machine learning (ML) methods and state-of-the-art statistical methods, represented by ABC2-SPH, in the task of predicting in-hospital mortality in COVID-19 patients using data available upon hospital admission. We overcome methodological and technological issues found in previous similar studies, while exploring a large sample (5,032 patients). Additionally, we take advantage of a large and diverse set of methods and investigate the effectiveness of meta-learning, more specifically Stacking, in combining the methods' strengths and overcoming their limitations. In our experiments, our Stacking solutions improved over the previous state of the art by more than 26% in predicting death, achieving an AUROC of 87.1% and a macro-F1 of 73.9%. We also investigated issues related to the interpretability and reliability of the predictions produced by the most effective ML methods. Finally, we discuss the adequacy of AUROC as an evaluation metric for the highly imbalanced and skewed datasets commonly found in health-related problems.
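The closing point about AUROC on imbalanced data can be illustrated with a minimal sketch. It is not the authors' evaluation code: the toy labels and scores are hypothetical, and the 0.5 decision threshold is an assumption. It shows how a model can rank patients well (high AUROC) yet never predict the minority class at a fixed threshold (low macro-F1).

```python
def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_for_class(y_true, y_pred, cls):
    """F1 of one class, treating that class as the positive label."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy data: 8 survivors (0) and 2 deaths (1).
y_true = [0] * 8 + [1] * 2
scores = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.45, 0.30, 0.40]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

print(f"AUROC    = {auroc(y_true, scores):.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

In this toy case both deaths are ranked above seven of the eight survivors, but no score reaches the threshold, so the F1 for the death class is zero and the macro average drops accordingly.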
Subject(s)
COVID-19, Coronavirus Infections
ABSTRACT
Background: It is not clear whether previous thyroid diseases influence the course and outcomes of COVID-19. The study aims to compare clinical characteristics and outcomes of COVID-19 patients with and without hypothyroidism. Methods: The study is part of a multicentric cohort of patients with confirmed COVID-19 diagnosis, including data collected from 37 hospitals. Matching for age, sex, number of comorbidities and hospital was performed to select the patients without hypothyroidism for the paired analysis. Results: From 7,762 COVID-19 patients, 526 had previously diagnosed hypothyroidism (50%) and 526 were selected as matched controls. The median age was 70 years (interquartile range 59.0-80.0) and 68.3% were female. The prevalence of underlying comorbidities was similar between groups, except for coronary and chronic kidney diseases, which had a higher prevalence in the hypothyroidism group (9.7% vs. 5.7%, p=0.015 and 9.9% vs. 4.8%, p=0.001, respectively). At hospital presentation, patients with hypothyroidism had a lower frequency of respiratory rate > 24 breaths per minute (36.1% vs. 42.0%; p=0.050) and of need for mechanical ventilation (4.0% vs. 7.4%; p=0.016). D-dimer levels were slightly lower in hypothyroid patients (2.3 times higher than the reference value vs. 2.9 times higher; p=0.037). In-hospital management was similar between groups, but hospital length-of-stay (8 vs. 9 days; p=0.029) and mechanical ventilation requirement (25.4% vs. 33.1%; p=0.006) were lower for patients with hypothyroidism. There was a trend toward lower in-hospital mortality in patients with hypothyroidism (22.1% vs. 27.0%; p=0.062). Conclusion: In this large Brazilian COVID-19 Registry, patients with hypothyroidism had a lower requirement for mechanical ventilation and showed a trend toward lower in-hospital mortality. Therefore, hypothyroidism does not seem to be associated with a worse prognosis, and should not be counted among the comorbidities that are risk factors for COVID-19 severity.
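The abstract names the matching variables (age, sex, number of comorbidities, hospital) but not the matching algorithm. As a minimal sketch only, one common approach is greedy 1:1 matching without replacement. The exact-match rule, the 5-year age tolerance, the field names, and the toy records below are all assumptions for illustration, not details from the study.

```python
def match_controls(cases, pool, age_tol=5):
    """Greedy 1:1 matching without replacement.

    Assumed rule: exact match on sex, comorbidity count and hospital,
    then the nearest age within `age_tol` years.
    """
    available = list(pool)
    pairs = []
    for case in cases:
        candidates = [
            c for c in available
            if c["sex"] == case["sex"]
            and c["n_comorb"] == case["n_comorb"]
            and c["hospital"] == case["hospital"]
            and abs(c["age"] - case["age"]) <= age_tol
        ]
        if not candidates:
            continue  # this case stays unmatched
        best = min(candidates, key=lambda c: abs(c["age"] - case["age"]))
        available.remove(best)  # each control is used at most once
        pairs.append((case["id"], best["id"]))
    return pairs


# Hypothetical records: two hypothyroid cases and four candidate controls.
cases = [
    {"id": "h1", "age": 70, "sex": "F", "n_comorb": 2, "hospital": "A"},
    {"id": "h2", "age": 58, "sex": "M", "n_comorb": 1, "hospital": "B"},
]
pool = [
    {"id": "c1", "age": 73, "sex": "F", "n_comorb": 2, "hospital": "A"},
    {"id": "c2", "age": 69, "sex": "F", "n_comorb": 2, "hospital": "A"},
    {"id": "c3", "age": 58, "sex": "M", "n_comorb": 1, "hospital": "A"},
    {"id": "c4", "age": 61, "sex": "M", "n_comorb": 1, "hospital": "B"},
]

pairs = match_controls(cases, pool)
print(pairs)
```

Greedy matching depends on the order in which cases are processed; optimal or propensity-score matching are alternatives when the control pool is small.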
Subject(s)
COVID-19, Thyroid Diseases, Renal Insufficiency, Chronic, Hypothyroidism
ABSTRACT
Objective: To provide a thorough comparative study of state-of-the-art machine learning methods and statistical methods for predicting in-hospital mortality in COVID-19 patients using data available upon hospital admission; to study the reliability of the predictions of the most effective methods by correlating the predicted probability of the outcome with the accuracy of the methods; and to investigate how explainable the predictions produced by the most effective methods are. Materials and Methods: De-identified data were obtained from COVID-19-positive patients in 36 participating hospitals, from March 1 to September 30, 2020. Demographic, comorbidity, clinical presentation and laboratory data were used as training data to develop COVID-19 mortality prediction models. Multiple machine learning and traditional statistical models were trained on this prediction task using a folded cross-validation procedure, from which we assessed performance and interpretability metrics. Results: The Stacking of machine learning models improved over the previous state-of-the-art results by more than 26% in predicting the class of interest (death), achieving an AUROC of 87.1% and a macro-F1 of 73.9%. We also show that some machine learning models can be highly interpretable and reliable, yielding more accurate predictions while providing a good explanation of why each prediction was made. Conclusion: The best results were obtained using the meta-learning ensemble model Stacking. State-of-the-art explainability techniques such as SHAP values can be used to draw useful insights into the patterns learned by machine learning algorithms. Machine learning models can be more explainable than traditional statistical models while also yielding highly reliable predictions. Key words: COVID-19; prognosis; prediction model; machine learning
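The reliability objective, relating the predicted probability of the outcome to the accuracy of the method, can be sketched as follows. This is one possible reading of that analysis and not the authors' procedure: the binning by confidence in the predicted class, the 0.5 threshold, the function name, and the toy data are all assumptions.

```python
def accuracy_by_confidence(y_true, prob_death, n_bins=2):
    """Group predictions by confidence in the predicted class and
    return (bin_low, bin_high, accuracy) for each non-empty bin."""
    counts = [[0, 0] for _ in range(n_bins)]  # [correct, total] per bin
    for y, p in zip(y_true, prob_death):
        pred = int(p >= 0.5)   # assumed decision threshold
        conf = max(p, 1 - p)   # confidence in the predicted class, in [0.5, 1]
        idx = min(int((conf - 0.5) / 0.5 * n_bins), n_bins - 1)
        counts[idx][0] += int(pred == y)
        counts[idx][1] += 1
    width = 0.5 / n_bins
    return [
        (0.5 + i * width, 0.5 + (i + 1) * width, correct / total)
        for i, (correct, total) in enumerate(counts)
        if total
    ]


# Hypothetical toy data: true outcomes and predicted probabilities of death.
y_true = [0, 0, 0, 1, 1, 0, 1, 1]
prob_death = [0.05, 0.10, 0.20, 0.90, 0.85, 0.60, 0.40, 0.55]

for low, high, acc in accuracy_by_confidence(y_true, prob_death):
    print(f"confidence {low:.2f}-{high:.2f}: accuracy {acc:.2f}")
```

A reliable model in this sense is one whose accuracy rises with confidence, as in the toy output, where the high-confidence bin is far more accurate than the low-confidence one.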
Subject(s)
COVID-19, Learning Disabilities, Death
ABSTRACT
Objective: To develop and validate a rapid scoring system at hospital admission for predicting in-hospital mortality in patients hospitalized with coronavirus disease 19 (COVID-19), and to compare this score with other existing ones. Design: Cohort study. Setting: The Brazilian COVID-19 Registry has been conducted in 36 Brazilian hospitals in 17 cities. Logistic regression analysis was performed to develop a prediction model for in-hospital mortality, based on the 3978 patients who were admitted between March and July 2020. The model was then validated in the 1054 patients admitted during August and September, as well as in an external cohort of 474 Spanish patients. Participants: Consecutive symptomatic patients (≥18 years old) with laboratory-confirmed COVID-19 admitted to participating hospitals. Patients who were transferred between hospitals and for whom admission data from the first hospital or the last hospital were not available were excluded, as well as those who were admitted for other reasons and developed COVID-19 symptoms during their stay. Main outcome measures: In-hospital mortality. Results: Median (25th-75th percentile) age of the model-derivation cohort was 60 (48-72) years, 53.8% were men, and in-hospital mortality was 20.3%. The validation cohorts had similar age distribution and in-hospital mortality. From 20 potential predictors, seven significant variables were included in the in-hospital mortality risk score: age, blood urea nitrogen, number of comorbidities, C-reactive protein, SpO2/FiO2 ratio, platelet count and heart rate. The model had high discriminatory value (AUROC 0.844, 95% CI 0.829 to 0.859), which was confirmed in the Brazilian (0.859) and Spanish (0.899) validation cohorts. Our ABC2-SPH score showed good calibration in both Brazilian cohorts, but, in the Spanish cohort, mortality was somewhat underestimated in patients with very high (>25%) risk. The ABC2-SPH score is implemented in a freely available online risk calculator (https://abc2sph.com/). Conclusions: We designed and validated an easy-to-use rapid scoring system, based on characteristics of COVID-19 patients commonly available at hospital presentation, for early stratification of in-hospital mortality risk in patients with COVID-19.